
5.3.7  Using edgeR for Differential Analysis

edgeR (Empirical Analysis of Digital Gene Expression Data in R) is an R/Bioconductor package for differential expression analysis of RNA-Seq data. It tests replicated count data for differential expression using generalized linear models for over-dispersed counts, and the models account for both biological and technical variability. edgeR uses the negative binomial distribution to model RNA-Seq count data.

Assume that for each sample i, the total number of reads (library size) is $N_i$, $\phi_g$ is the dispersion coefficient of gene g, and $p_{gi}$ is the relative abundance of gene g in the experimental group i. The mean and variance are estimated as follows:

$$\mu_{gi} = N_i \, p_{gi} \tag{5.23}$$

$$\text{variance} = \mu_{gi} \, (1 + \mu_{gi} \, \phi_g) \tag{5.24}$$

For differential expression analysis using negative binomial regression, the parameters of interest are the relative abundances of each gene ($p_{gi}$).
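As a quick numerical check of Formulas (5.23) and (5.24), the following base R sketch computes the mean and variance of a negative binomial distribution directly from its probability mass function and compares them with the formulas. The library size, relative abundance, and dispersion values are hypothetical and chosen only for illustration. The sketch relies on the parameterization of R's dnbinom() with the arguments mu and size, whose variance is mu + mu^2/size, so size corresponds to 1/phi_g.

```r
# Hypothetical values, for illustration only
N_i   <- 1e6     # library size of sample i
p_gi  <- 5e-5    # relative abundance of gene g in group i
phi_g <- 0.2     # dispersion of gene g

mu_gi <- N_i * p_gi                             # Formula (5.23)
formula_var <- mu_gi * (1 + mu_gi * phi_g)      # Formula (5.24)

# Mean and variance computed from the negative binomial mass function.
# The support is truncated at 10000, far beyond where the mass is negligible.
x    <- 0:10000
prob <- dnbinom(x, mu = mu_gi, size = 1 / phi_g)
nb_mean <- sum(x * prob)
nb_var  <- sum((x - nb_mean)^2 * prob)

print(c(mean_formula = mu_gi, mean_pmf = nb_mean))
print(c(var_formula = formula_var, var_pmf = nb_var))
```

Setting phi_g close to zero in this sketch makes the variance approach the mean, which is the Poisson case.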

As discussed above, the negative binomial distribution reduces to the Poisson distribution when the count data are not over-dispersed ($\phi_g = 0$), or is replaced by the quasi-Poisson model if the variance is proportional to the mean. edgeR interprets the dispersion ($\phi_g$) through the coefficient of variation (CV) of the biological variation between the samples: the dispersion is the square of the biological coefficient of variation (BCV). Dividing Formula (5.24) by $\mu_{gi}^2$ gives the squared coefficient of variation of the counts:

$$\text{CV}^2 = \frac{1}{\mu_{gi}} + \phi_g \tag{5.25}$$
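Formula (5.25) follows from expanding the variance in (5.24) as $\mu_{gi} + \mu_{gi}^2 \phi_g$ and dividing by $\mu_{gi}^2$. The term $1/\mu_{gi}$ is the Poisson (technical) contribution, which shrinks as the expected count grows, while $\phi_g$ is the squared BCV. The short base R sketch below illustrates this with a hypothetical dispersion and hypothetical expected counts: the total CV decreases toward the BCV as the expected count increases.

```r
# Hypothetical values, for illustration only
phi_g <- 0.16                   # dispersion of gene g
bcv   <- sqrt(phi_g)            # biological coefficient of variation
mu    <- c(10, 100, 1000, 1e6)  # expected counts of increasing size

# Formula (5.25): squared CV = technical part (1 / mu) + biological part (phi_g)
total_cv <- sqrt(1 / mu + phi_g)

print(data.frame(mu = mu, total_cv = total_cv, bcv = bcv))
```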

edgeR estimates a common dispersion shared by all genes; it can also estimate gene-wise dispersions, which it then shrinks toward a consensus value. Differential expression is then assessed for each gene using an exact test for over-dispersed data [33].
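The steps just described can be sketched in a few lines with the edgeR functions DGEList(), calcNormFactors(), estimateDisp(), exactTest(), and topTags(). This is only a minimal illustration under stated assumptions, not the analysis of the HTSeq-count output: the count matrix is simulated, and the gene names, sample names, group labels, dispersion, and fold change are all hypothetical. The edgeR calls are guarded so that the sketch still runs if the package is not yet installed.

```r
set.seed(1)

# Simulated (hypothetical) counts: 1000 genes, 3 control and 3 treated samples
n_genes <- 1000
group   <- factor(rep(c("control", "treated"), each = 3))
base_mu <- 2^runif(n_genes, min = 4, max = 10)    # baseline expected counts
mu      <- matrix(base_mu, nrow = n_genes, ncol = 6)
mu[1:50, 4:6] <- mu[1:50, 4:6] * 4                # first 50 genes up four-fold
counts  <- matrix(rnbinom(n_genes * 6, mu = mu, size = 1 / 0.1),
                  nrow = n_genes,
                  dimnames = list(paste0("gene", seq_len(n_genes)),
                                  paste0("sample", 1:6)))

if (requireNamespace("edgeR", quietly = TRUE)) {
  y  <- edgeR::DGEList(counts = counts, group = group)
  y  <- edgeR::calcNormFactors(y)   # normalization factors for library sizes
  y  <- edgeR::estimateDisp(y)      # common, trended, and gene-wise dispersions
  et <- edgeR::exactTest(y)         # exact test between the two groups
  print(y$common.dispersion)
  print(edgeR::topTags(et, n = 5))  # top-ranked genes by p-value
}
```

In this sketch the common dispersion is stored in y$common.dispersion and the shrunken gene-wise values in y$tagwise.dispersion.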

In the following, we will analyze the non-normalized count data produced by the HTSeq-count program in the previous step and saved as the file “htcount.txt” in the “features” directory. The analysis will be carried out in R, so R must be installed on your computer. Installation instructions for R are available at “https://cran.r-project.org/”. You can also use R through Anaconda. We assume that R is installed on your computer and running. In R, you will also need to install the limma and edgeR Bioconductor packages by following the installation instructions available at “https://bioconductor.org/packages/release/bioc/html/edgeR.html” for edgeR and “https://bioconductor.org/packages/release/bioc/html/limma.html” for limma. For the current versions, open R and run the following at the R prompt:

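# Install BiocManager from CRAN if it is not already available
if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
# Install edgeR and limma from Bioconductor
BiocManager::install("edgeR")
BiocManager::install("limma")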
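To confirm that the installation succeeded, one option is to check that both packages can be loaded and to print their versions. The sketch below uses only base R functions (requireNamespace() and packageVersion()); the variable names are arbitrary.

```r
pkgs <- c("edgeR", "limma")

# TRUE for each package whose namespace can be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report the version of each package that is available
for (p in pkgs[installed]) {
  cat(p, as.character(packageVersion(p)), "\n")
}
```

Any package reported as FALSE can be reinstalled with the BiocManager::install() commands above.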